A data lake is a storage repository that holds a large amount of raw data in its native format until the data is needed for processing. Data lakes typically store unstructured data, but they can combine data of different kinds. The data lake is part of the Collector component, as it stores the raw data received from the data sources; for that reason, it must be aligned with the defined requirements. Moreover, a data lake is easier to manage when metadata is attached to the stored items, which mitigates the problem of holding a large amount of disorganized data.
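As a minimal sketch of this metadata-based management idea (the function name, directory layout, and metadata fields are illustrative assumptions, not part of any cited system), an ingestion step can store each raw payload unmodified and write a JSON metadata sidecar next to it:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def ingest(lake_root: str, source: str, payload: bytes, content_type: str) -> Path:
    """Store a raw payload as-is and write a JSON metadata sidecar next to it.

    Hypothetical helper: keys raw files by content hash so duplicates collapse,
    and records provenance metadata for later discovery and organization.
    """
    root = Path(lake_root) / source
    root.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(payload).hexdigest()
    data_path = root / digest          # raw data kept untouched, keyed by content hash
    data_path.write_bytes(payload)
    metadata = {
        "source": source,              # which data source produced the payload
        "content_type": content_type,  # hint for consumers; the data itself stays raw
        "size_bytes": len(payload),
        "sha256": digest,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    (root / f"{digest}.meta.json").write_text(json.dumps(metadata, indent=2))
    return data_path

# Usage: ingest a raw sensor reading without processing it
stored = ingest("/tmp/lake", "sensor-a", b'{"temp": 21.5}', "application/json")
```

The payload itself is never parsed or transformed at ingestion time; only the sidecar metadata is structured, which is what later makes the raw collection searchable.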
References:
- N. Miloslavskaya and A. Tolstoy, 'Application of big data, fast data, and data lake concepts to information security issues', presented at the Proceedings - 2016 4th International Conference on Future Internet of Things and Cloud Workshops, W-FiCloud 2016, 2016, pp. 148-153.
- C. Diamantini, P. L. Giudice, L. Musarella, D. Potena, E. Storti, and D. Ursino, 'A new metadata model to uniformly handle heterogeneous data lake sources', Commun. Comput. Inf. Sci., vol. 909, pp. 165-177, 2018.